1 Macula Densa Project

1.1 Objectives

Three Main Goals of this File
Produce Cleaner looking code.
Identify the amount of clusters there are
Identify the top genes expressed in each of the clusters

1.2 Problems I need to Fix

Save things as RDS file so I dont have to rerun the whole code

2 Loading in Data sets + Library packages.

options(future.globals.maxSize = 74 * 1024^3) # 55 GB
getOption("future.globals.maxSize") #59055800320
## [1] 79456894976
SO5 <- LoadSeuratRds(here("jk_code", "SO5.rds"))

head(SO5@meta.data)

3 Analyzing the SO5 DATASET

DimPlot(SO5,split.by ="sample")

DimPlot(SO5)

Based off this I can see that

SO1-> control SO2 -> low_salt SO3 -> low_salt SO4 -> control

SO5 <- FindNeighbors(SO5, dims = 1:30, verbose = F)
SO5 <- FindClusters(SO5, resolution = 0.25)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 11426
## Number of edges: 376311
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8730
## Number of communities: 6
## Elapsed time: 0 seconds
SO5m <- FindAllMarkers(SO5, only.pos = TRUE)
## Calculating cluster 0
## For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
## (default method for FindMarkers) please install the presto package
## --------------------------------------------
## install.packages('devtools')
## devtools::install_github('immunogenomics/presto')
## --------------------------------------------
## After installation of presto, Seurat will automatically use the more 
## efficient implementation (no further action necessary).
## This message will be shown once per session
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
SO5m %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1)
SO5m %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1) %>%
    slice_head(n = 5) %>%
    ungroup() -> top10
DoHeatmap(SO5, features = top10$gene) + NoLegend()
## Warning in DoHeatmap(SO5, features = top10$gene): The following features were
## omitted as they were not found in the scale.data slot for the SCT assay: Ifi47,
## Rpl3-ps1

DimPlot(SO5)

FeaturePlot(SO5,"S100g",split.by = "treatment") #cluster

FeaturePlot(SO5,"Junb",split.by = "treatment") # cluster

FeaturePlot(SO5,"Cxcl10",split.by = "treatment") # cluster

FeaturePlot(SO5,"Pappa2",split.by = "treatment")

DimPlot(SO5,split.by = "treatment")

4 Identifying Clusters

4.1 Cluster 5

# Cxcl10
FeaturePlot(SO5, features = "Cxcl10", split.by = "treatment")

# Ifit1
FeaturePlot(SO5, features = "Ifit1", split.by = "treatment",order = TRUE)

# Isg15
FeaturePlot(SO5, features = "Isg15", split.by = "treatment",order = TRUE)

# Gbp10
FeaturePlot(SO5, features = "Gbp10", split.by = "treatment")

# Ifi47
FeaturePlot(SO5, features = "Ifi47", split.by = "treatment")

4.2 Cluster 4

# Fos
FeaturePlot(SO5, features = "Fos", split.by = "treatment")

# Junb
FeaturePlot(SO5, features = "Junb", split.by = "treatment")

# Egr1
FeaturePlot(SO5, features = "Egr1", split.by = "treatment")

# Fosb
FeaturePlot(SO5, features = "Fosb", split.by = "treatment")

# Zfp36
FeaturePlot(SO5, features = "Zfp36", split.by = "treatment")

4.3 Cluster 2

# Egf
FeaturePlot(SO5, features = "Egf", split.by = "treatment")

# Krt7
FeaturePlot(SO5, features = "Krt7", split.by = "treatment")

# Fabp3
FeaturePlot(SO5, features = "Fabp3", split.by = "treatment")

# Cldn19
FeaturePlot(SO5, features = "Cldn19", split.by = "treatment")

# Tmem52b
FeaturePlot(SO5, features = "Tmem52b", split.by = "treatment")

4.4 Cluster 0

# Mcub
FeaturePlot(SO5, features = "Mcub", split.by = "treatment")

# Aard
FeaturePlot(SO5, features = "Aard", split.by = "treatment")

# Fetub
FeaturePlot(SO5, features = "Fetub", split.by = "treatment")

Observation : MCUB seems to be the highly defined gene in low salt

4.5 Cluster 1

# S100g
FeaturePlot(SO5, features = "S100g", split.by = "treatment")

## Cluster 3

# Leng9
FeaturePlot(SO5, features = "Leng9", split.by = "treatment")

# Subset Cluster 0, 1, 3

# SO6 checking another cluster
SO6<- subset(SO5, idents = c("0","1","3"))

SO6 <- FindNeighbors(SO6, dims = 1:30, verbose = F)
SO6 <- FindClusters(SO6, resolution = 0.1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8805
## Number of edges: 290708
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9087
## Number of communities: 2
## Elapsed time: 0 seconds
DimPlot(SO6,split.by = "sample")

SO6m <- FindAllMarkers(SO6, only.pos = TRUE)
## Calculating cluster 0
## Calculating cluster 1
SO6m %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 0.5)
DimPlot(SO6,split.by = "sample")

DimPlot(SO6,split.by = "sample", group.by = "treatment" )

head(SO6@meta.data)
FeaturePlot(SO6,"Pappa2",split.by = "sample")

4.6 Clusters

# S100g
FeaturePlot(SO6, features = "S100g")

# Aard
FeaturePlot(SO6, features = "Aard")

# Mcub
FeaturePlot(SO6, features = "Mcub")

DimPlot(SO6,group.by = "sample",split.by = "sample")

DimPlot(SO6,group.by = "treatment",split.by = "sample")

My guess is that these are the same, As you go from control to low_salt the cells start to express different genes. How can I test this?

I think the next step after I figure out something with these clusters is to figure out what each of these top genes do, the functions, and purpose of them.

Next Steps: Take a Few steps back and identify S100g Cluster

Multidimensional Dotplot (top genes that are expressed)

Possibly Filter out Rp genes for maybe mt too

Annot objects

Read Janos’ article

Next less : DEG list pathway analysis volcano plot

5 Backtracking

DimPlot(SO5)

# Check S100g so i Have to make this into more clusters.

SO5_new <- RunUMAP(SO5, dims = 1:30, verbose = F)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
SO5_new <- FindNeighbors(SO5, dims = 1:30, verbose = F)
SO5_new <- FindClusters(SO5, resolution = .4)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 11426
## Number of edges: 376311
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8414
## Number of communities: 9
## Elapsed time: 229 seconds
# back tracking so an old one
p <- DimPlot(SO5)
w <- DimPlot(SO5_new)
p+w

# I thought 5 4 2 are its own cluster
# So in this it would be cluster 
# 8=5 , 4=5 , 2=2 , 
# Cluster 6 should be the s100g cluster 

SO5_m <- FindAllMarkers(SO5_new, only.pos = TRUE)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
SO5_m %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1)
SO5_m %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1) %>%
    slice_head(n = 5) %>%
    ungroup() -> top10
DoHeatmap(SO5_new, features = top10$gene) + NoLegend()
## Warning in DoHeatmap(SO5_new, features = top10$gene): The following features
## were omitted as they were not found in the scale.data slot for the SCT assay:
## Ifi47

p+w

Unknown from this is now still 0, 1,3, 4,

markers.to.plot1 <- c(
  "S100g",    # 
  "Atf3",     # 
  "Egr1",     # 
  "Fos",      # 
  "Jun",      #
  "Junb",     #
  "Pappa2",   #
  "Cxcl10",   # 
  "Cldn19",   # 
  "Krt7",     #
  "Egf",      #
  "Aard",
  "Ptger3",
  "Leng9",
  "Ckb",
  "Mcub",
  "Fabp3",
  "Ccn1",
  "Foxq1",
  "Cxcl12",
  "Vash2",
  "Pamr1",
  "Vegfa",
  "Nov"
)

                      
                      
DotPlot(SO5_new,
features = markers.to.plot1,
dot.scale = 8,
dot.min = 0,
scale.max = 100,
scale.min = 0,
col.min = -2.5,
col.max = 2.5)+
coord_flip()
## Warning: The following requested variables were not found: Nov

# I feel like cluster 8 is a contaminate because the it shares a lot of similar genes with my "assumed clusters"

Cluster 0 = Aard, Mcub

Cluster 2 = Egf, Krt7, Cldn19

Cluster 4 = Ptger3

Cluster 5 = Junb, Jun, Fos, Egr1, Atf3

Cluster 6 = S100g

Cluster 7 = Cxcl10

Cluster 3 = could be Leng9 but i feel thats so insignificant.

Cluster 1 = No idea. Missing genes for 1 and 3

6 Subsetting contaminate

SO7 <- subset(SO5_new, idents = "8", invert = TRUE)

SO7 <- RunUMAP(SO7, dims = 1:30, verbose = F)
SO7 <- FindNeighbors(SO7, dims = 1:30, verbose = F)
SO7 <- FindClusters(SO7, resolution = .4)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 11404
## Number of edges: 376163
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8419
## Number of communities: 8
## Elapsed time: 0 seconds
DimPlot(SO7)

DotPlot(SO7,
features = markers.to.plot1,
dot.scale = 8,
dot.min = 0,
scale.max = 100,
scale.min = 0,
col.min = -2.5,
col.max = 2.5)+
coord_flip()
## Warning: The following requested variables were not found: Nov

VlnPlot(SO7, features = "S100g")

VlnPlot(SO7, features = "Cxcl10")

VlnPlot(SO7, features = "Mcub")

VlnPlot(SO7, features = "Aard")

VlnPlot(SO7, features = "Egf")

VlnPlot(SO7, features = "Ptger3")

VlnPlot(SO7, features = "Leng9")

VlnPlot(SO7, features = "Jun")

My assumptions have changed now that I have viewed violin plots.

I think the main clusters are 5, 6, 7, 2

I think 0, 1, 3, 4 are its own thing but just undergoing change?

Specifically I think 1 and 0 are the same and 4 and 3 maybe?

I think cluster 1 turns into 0 in low salt conditions and cluster 4 turns into 3. Both 1 and 3 have Leng9 which is something thats expressed during evolutionary something like that.

According to the Multidimensional dotplot 0 and 1 also look pretty identical minus the expression of a few specific genes but that may jsut be the product of change.

Not sure how to explain 4 and 3 though. # Analyzing

DimPlot(SO7,split.by = "treatment")

DimPlot(SO7,split.by = "sample")

SO1 and SO4 = Control SO2 and SO3 = low_salt

Im guessing, the two Control samples were slightly different. One expressing more in each. They then used induced low salt into each of these two controls types and got SO2 and SO3 but they’re different because of control.

I think it might be a good idea to analyze maybe just SO1 and SO2 to see the change of 4 turning into 3 and analyzing SO4 and SO3 separate to see cluster 1 changing to 0.